Abstract


This report presents a deductive proof of a self-stabilizing distributed clock 
synchronization protocol. It is focused on the distributed clock synchronization of 
an arbitrary, non-partitioned digraph ranging from fully connected to 1- 
connected networks of nodes while allowing for differences in the network 
elements. This protocol does not rely on assumptions about the initial state of the 
system, and no central clock or a centrally generated signal, pulse, or message is 
used. Nodes are anonymous, i.e., they do not have unique identities. There is no 
theoretical limit on the maximum number of participating nodes. The only 
constraint on the behavior of the node is that the interactions with other nodes are 
restricted to defined links and interfaces. We present a deductive proof of the 
correctness of the protocol as it applies to the networks with unidirectional and 
bidirectional links. We also confirm the claims of determinism and linear 
convergence.

1. Introduction


Synchronization algorithms are essential for managing the use of resources and controlling 
communication in a distributed system. Synchronization of a distributed system is the process 
of achieving and maintaining a bounded skew among independent local clocks. A distributed 
system is said to be self-stabilizing if, from an arbitrary state, it is guaranteed to reach a 
legitimate state in a finite amount of time and remain in a legitimate state. A legitimate state is a 
state where all parts in the system are in synchrony. The self-stabilizing distributed-system clock 
synchronization problem is, therefore, to develop an algorithm (i.e., a protocol) to achieve and 
maintain synchrony of local clocks in a distributed system after experiencing system-wide 
disruptions in the presence of network element imperfections. The convergence and closure 
properties address achieving and maintaining network synchrony, respectively. Hereafter in this 
report, we use the term synchronization to mean self-stabilizing clock synchronization in 
distributed systems. 

A thorough understanding of the synchronization of a distributed system has proven to be elusive 
for decades. The main challenges associated with distributed synchronization are the complexity 
of developing a solution and proving the correctness of these solutions. It is possible to have a 
solution that is hard to prove or refute. Such a solution, however, is not likely to be accepted or 
used in practical systems. The proposed solutions must restore synchrony and coordinated 
operations after experiencing system-wide disruptions in the presence of network element 
imperfections and, for ultra-reliable distributed systems, in the presence of various faults. A fault
is a defect or flaw in a system component resulting in an incorrect state [Gir 2005] [Tor 2005]
[But 2008]. In addition, a proposed solution must be proven to be correct. If a mathematical
proof is deemed difficult, at a minimum, the proposed solution must be shown to be correct using
available formal methods techniques. Furthermore, addressing network element imperfections is
necessary to make a solution applicable to realizable systems.

In [Mai 2011A] a solution is presented for an arbitrary network (digraph) in the absence of
faults. The system under study is an arbitrary, non-partitioned digraph ranging from fully
connected to 1-connected networks of nodes while allowing for differences in the network
elements. Some networks of interest include grid, ring, fully connected, bipartite, and star (hub).
This solution neither requires any particular information flow nor imposes changes (e.g.,
embedding a directed spanning tree or rewiring) on the network in order to achieve synchrony.
The assumption of an absence of faults is equivalent to the assumption that all faults are
detectable. This departure from our previous work at the Byzantine extreme of the fault
spectrum [Mai 2006A] is in part because of the niche use and the extra cost associated with the
Byzantine faults. Also, using authentication and error detection techniques, it is possible to
substantially reduce the effects of a variety of faults in the system. Furthermore, the classical
definition of a self-stabilizing algorithm assumes generally that there are no faults in the system.

In this report we present a deductive proof for the correctness claims of A Self-Stabilizing
Distributed Clock Synchronization Protocol For Arbitrary Digraphs [Mai 2011A] and claims of
determinism and linear convergence of the protocol with respect to the self-stabilization period.
A bounded model of the protocol was model checked by confirming that a set of candidate
systems self-stabilized from any state [Mai 2011B]. The model checking results of the bounded
models of the protocol have validated the correctness of the protocol as they apply to the
networks with unidirectional and bidirectional links. In addition, the results have confirmed the
claims of determinism and linear convergence.

This report is organized as follows. In Section 2 we provide a system overview. We present the 
protocol and its description in Section 3. We present a deductive proof of the correctness of the 
protocol in Section 4. We present concluding remarks in Section 5. 


2. System Overview 

We consider a system of pulse-coupled entities (e.g., oscillators, pacemaker cells) pulsating 
periodically at regular time intervals. These entities are said to be coupled through some 
physical means (wire or fiber cables, chemical process, or wirelessly through air or vacuum) that 
allows them to influence each other. We model the system as a set of nodes that represent the 
pulse-coupled entities and a set of communication links that represent their interconnectivity. 

The underlying topology considered is an arbitrary, non-partitioned digraph ranging from fully
connected to 1-connected network of K > 1 nodes that exchange messages through a set of
communication links. Nodes are anonymous, i.e., they do not have unique identities. All nodes
are assumed to be good, i.e., actively participate in the synchronization process and correctly
execute the protocol. The communication links are assumed to be between distinct nodes. All
communication links are assumed to be good, i.e., reliably transfer data from their source nodes
to their destination nodes. The nodes communicate with each other by exchanging broadcast
messages. Broadcast of a message by a node is realized by transmitting the message, at the same
time, to all nodes that are directly connected to it. The communication network does not
guarantee any relative order of arrival of a broadcast message at the receiving nodes, that is, a
consistent delivery order of a set of messages does not necessarily reflect the temporal or causal
order of the message transmissions [Kop 1997]. There is neither a central system clock nor an
externally generated global pulse or message at the network level. The communication links and
nodes can behave arbitrarily provided that eventually the system adheres to the protocol
assumptions (Section 3.4).

2.1. Drift Rate (ρ)

Each node is driven by an independent, free-running local physical oscillator (i.e., the phase is 
not controlled in any way) and a logical-time clock (i.e., a counter), denoted LocalTimer, which 
locally keeps track of the passage of time and is driven by the local physical oscillator. An 
oscillator tick, also called a clock tick or a system tick, is a discrete value and the basic unit of 
time in the network [Tor 2005]. 

An ideal oscillator has zero drift rate with respect to real-time, perfectly marking the passage of
time. Real oscillators are characterized by non-zero drift rates with respect to real-time. The
oscillators of the nodes are assumed to have a known bounded drift rate, ρ, which is a small
constant with respect to real-time, where ρ is a unitless non-negative real value and is
constrained to 0 ≤ ρ ≪ 1. The maximum drift of the fastest LocalTimer over a time interval of t
is given by (1+ρ)t. The maximum drift of the slowest LocalTimer over a time interval of t is
given by (1/(1+ρ))t. Therefore, the maximum relative drift of the fastest and slowest nodes
with respect to each other over a time interval of t is given by the following equation.

δ(t) = ((1+ρ) − 1/(1+ρ))t (1)

Although generally ρ is bounded by 0 ≤ ρ ≪ 1, in practice, ρ is assumed to be very close to
zero. The upper bound on ρ is extensively discussed in [Mai 2011A] where we determined, at
least for the protocol presented in this report, that ρ is bounded by 0 ≤ ρ < 0.3 even for
theoretical purposes. Nevertheless, in the rest of this report, we'll use the generally accepted
bounds for ρ, i.e., 0 ≤ ρ ≪ 1.
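As a minimal sketch of Equation (1), the relative drift can be evaluated numerically. The function name relative_drift and the sample values below are illustrative assumptions, not part of the report.

```python
def relative_drift(t: float, rho: float) -> float:
    """Maximum relative drift between the fastest and slowest LocalTimer
    over a real-time interval t, following Equation (1)."""
    if rho < 0:
        raise ValueError("the drift rate must be non-negative")
    return ((1.0 + rho) - 1.0 / (1.0 + rho)) * t


# Hypothetical values: an interval of 1000 ticks and a drift rate of 1e-4.
# For small drift rates the result is close to 2 * rho * t.
print(relative_drift(1000.0, 1e-4))
```

For a drift rate of zero the function returns zero, which matches the ideal-oscillator case described above.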

2.2. The Logical Clock (LocalTimer)

Figure 1. The LocalTimer.

The LocalTimer is driven by the local physical oscillator, takes on discrete values, and locally
keeps track of the passage of time. As shown in Figure 1, the LocalTimer is a monotonic linear
function increasing from an initial value to a maximum value. If uninterrupted, the LocalTimer
periodically takes on all integer values from its initial value, 0, to its maximum value, P, linearly
increasing within each period, thus, the LocalTimer is bounded by 0 ≤ LocalTimer ≤ P.

2.3. Communication Delay (D, d, and γ)



Figure 2. Event-response delay, D, and network imprecision, d.

The communication delay between adjacent nodes is expressed in terms of the minimum
event-response delay, D, and network imprecision, d. These parameters are described with the help of
Figure 2. As depicted in this figure, a message transmitted by a node at real time t_0 is expected
to arrive at its directly connected adjacent nodes, be processed, and subsequent messages
generated by those nodes within the time interval of [t_0 + D, t_0 + D + d]. Communication between
independently clocked nodes is inherently imprecise. The network imprecision, d, is the
maximum time difference among all receivers of a message from a transmitting node with
respect to real time. The imprecision is due to the drift of the oscillators with respect to real
time, jitter, discretization error, temperature effects and differences in the lengths of the physical
communication media. These two parameters are assumed to be bounded such that D ≥ 1 and
d ≥ 0 and both have units of real time clock ticks. The communication latency, denoted γ, is
expressed in terms of D and d, and is defined as γ = (D + d) and so has units of real time clock
ticks. Therefore, the communication delay between any two adjacent nodes is bounded by
[D, γ].


2.4. Topology (T)

A communication link, or simply link, is an edge in the graph representing a direct physical
connection between two nodes. A path is a logical connection between two nodes consisting of
one or more links. A path-length is the number of links connecting any two nodes.

The general topology, T, considered is a strongly connected directed graph (digraph) consisting 
of K nodes, where each node is connected to the graph by at least one link, there is a path from 
any node to any other node, and the links are either unidirectional or bidirectional. Furthermore, 
we assume there is no direct link from a node to itself, i.e., no self-loop, and there are no multiple 
links directly connecting any two nodes in any one direction. 

In this report, we use the terms network and graph interchangeably. The following graph 
specific terms are used in the subsequent sections. 

• Two nodes are said to be adjacent to each other or neighbors if they are connected to
each other via a direct communication link.

• L, an integer value, is the number of links denoting the largest loop in the graph, i.e., the
maximum value of the longest path-lengths from a node back to itself visiting the nodes
along the path only once (except for the first node which is also the last node).

• W, an integer value, is the number of links signifying the width or diameter of the graph,
i.e., the maximum value of the shortest path connecting any two nodes.

For digraphs of size K > 1, L and W are bounded by 2 ≤ L ≤ K and 1 ≤ W ≤ K − 1.
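The definitions of W and L can be made concrete with a short sketch. The adjacency-list representation and the function names below are assumptions made for illustration; the exhaustive search for L takes exponential time and is only meant for small example graphs.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # node -> nodes it has an outgoing link to


def shortest_path_lengths(graph: Graph, source: int) -> Dict[int, int]:
    """Breadth-first search: links on the shortest path from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def width(graph: Graph) -> int:
    """W: the maximum, over all ordered node pairs, of the shortest path-length.
    Raises ValueError if some node cannot reach every other node."""
    w = 0
    for s in graph:
        dist = shortest_path_lengths(graph, s)
        if len(dist) != len(graph):
            raise ValueError("digraph is not strongly connected")
        w = max(w, max(dist.values()))
    return w


def largest_loop(graph: Graph) -> int:
    """L: the longest path from a node back to itself that visits each
    intermediate node only once, found by exhaustive depth-first search."""
    best = 0

    def dfs(start: int, node: int, visited: set, length: int) -> None:
        nonlocal best
        for nxt in graph[node]:
            if nxt == start:
                best = max(best, length + 1)
            elif nxt not in visited:
                visited.add(nxt)
                dfs(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in graph:
        dfs(s, s, {s}, 0)
    return best


# Unidirectional ring of four nodes: 0 -> 1 -> 2 -> 3 -> 0
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
# Bidirectional chain of three nodes: 0 <-> 1 <-> 2
chain = {0: [1], 1: [0, 2], 2: [1]}
print(width(ring), largest_loop(ring))
print(width(chain), largest_loop(chain))
```

The ring reaches the upper ends of the bounds (L = K and W = K − 1), while the chain shows that a bidirectional link already forms a loop of two links.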

3. The Protocol


In this section we enumerate protocol assumptions, properties, parameters, and describe the
protocol in pseudo-code. The general form of the distributed synchronization problem, S, is
defined by the following septuple [Mai 2011A].

S = (K, T, D, d, ρ, P, F)

In other words, the distributed synchronization problem is a function of the number of nodes (K),
network topology (T), event-response delay (D), communication imprecision (d), oscillator drift
rate (ρ), synchronization period (P), and number of faults (F), respectively. The solution to this
problem is a protocol with convergence and closure properties, at a minimum, as discussed
subsequently in this section. However, in the protocol presented in this report we do not deal
with faults, thus F = 0.

Each node is driven by an independent logical-time clock, LocalTimer. The clocks need to be
periodically synchronized due to their inherent drift with respect to each other. In order to
achieve synchronization, the nodes communicate by exchanging Sync messages. A node is said
to time out when its LocalTimer reaches its maximum value. Upon time out, the node generates
a new Sync message and broadcasts it to others. A node is said to be interrupted when it
accepts an incoming Sync message before its LocalTimer reaches the maximum value, i.e.,
before it times out. Upon interrupt and except for a predefined window (Section 3.1), the node
relays the incoming Sync message by broadcasting it to others.

The periodic time synchronization after achieving the initial synchrony is referred to as the
resynchronization process whereby all nodes reengage in the synchronization process. The
resynchronization process begins when the first node times out and transmits a Sync message and
ends after the last node transmits a Sync message. For ρ ≪ 1, the fastest node cannot time out
again before the slowest node transmits a Sync message [Mai 2011A].

A Sync message is transmitted either as a result of a resynchronization timeout, or when a node
receives Sync message(s) indicative of other nodes engaging in the resynchronization process.
The messages to be delivered to the destination nodes are deposited on communication links.

The following definitions and terms are used in the description and operation of the protocol.
Figure 3 is used to help with the descriptions. All protocol parameters and the network level
measurements are real values with time-based terms having units of real time clock ticks.
However, locally and at the node level, all parameters are discrete. The discretization is for
practical purposes when implementing the protocol.

Figure 3. Resynchronization process and various precisions.

• The resynchronization period, denoted P, has units of real time clock ticks and is
defined as the upper bound on the time interval between any two consecutive resets of the
LocalTimer by a node.

• Drift per t, denoted δ(t), has units of real time clock ticks and is defined as the maximum
amount of drift between any two nodes for the duration of t, δ(t) ≥ 0. In particular:

• Drift per D, denoted δ(D), for the duration of one D, δ(D) ≥ 0.

• Drift per γ, denoted δ(γ), for the duration of one γ, δ(γ) ≥ 0.

• Drift per P, denoted δ(P), for the duration of one period P, δ(P) ≥ 0.

• The graph threshold, T_S, is based on a specified graph topology and has units of real
time clock ticks.

• The guaranteed precision or simply precision of the network, denoted π, 0 ≤ π < P, has
units of real time clock ticks and is defined as the guaranteed achievable precision among
all nodes.

• The convergence time, denoted C, has units of real time clock ticks and is defined as the
bound on the maximum time it takes for the network to converge, i.e., to achieve
synchrony.

• Precision between LocalTimers of any two adjacent nodes N_i and N_j is denoted by Δ_ij
and has units of real time clock ticks.

• The initial synchrony is a state of the network and the earliest time when the precision
among all nodes, upon convergence, is within π. The initial synchrony occurs at time
C_Init.

• The initial precision among LocalTimers of all nodes is denoted by Δ_Init, has units of real
time clock ticks and, for all t ≥ C_Init, is defined as a measure of the precision of the
network immediately after a resynchronization process.

• The initial guaranteed precision among LocalTimers of all nodes is denoted by
Δ_InitGuaranteed, has units of real time clock ticks and, for all t ≥ C, is defined as a measure of
the precision of the network immediately after a resynchronization process.

3.1. The Graph Threshold (T_S)

When a node receives a Sync message, except during a predefined window, it accepts the Sync
message, resets its LocalTimer and relays the Sync message by broadcasting it to others. The
predefined window where the node ignores all incoming Sync messages, referred to as ignore
window¹, provides a means for the protocol to stop the endless cycle of resynchronization
triggered by the follow up Sync messages [Mai 2011A]. We bound the ignore window to
[D, T_S). The lower bound is due to the minimum event-response delay, D, and the upper bound,
referred to as the graph threshold, T_S, is a function of a specified graph topology.


¹ The term refractory period is used in biologically inspired work indicating a brief period of time, following the
stimulation of a nerve, during which the nerve will not respond to a second stimulus.

3.2. Sync Message And Its Validity


In order to achieve synchrony, the nodes communicate by exchanging Sync messages². When
the system is in synchrony, the protocol overhead is at most one message per resynchronization
period P. Assuming physical-layer error detections are dealt with separately, the reception of a
Sync message is indicative of its validity in the value domain. The protocol performs as intended
when the timing requirements of the messages from every node are satisfied. However, in the
absence of faults, the reception of a Sync message is indicative of its validity in the value and
time domains. A valid Sync message is discarded after it is relayed to the synchronizer (see
Section 3.3) and has been kept for one local clock tick.

3.3. The Monitor, The Synchronizer, And Protocol Functions 

A node consists of a synchronizer and a set of monitors. To assess the behavior of other nodes, 
a node employs as many monitors as the number of nodes it is directly connected to with one 
monitor for each source of incoming messages. A node neither uses nor monitors its own 
messages. A monitor keeps track of the activities of its corresponding source node. Specifically, 
a monitor reads, evaluates, validates, and stores the last valid message it receives from that node. 
Upon conveying the valid message to the local synchronizer, a monitor disposes of the valid 
message after it has been kept for one local clock tick. The assessment results of the monitored 
nodes are utilized by the synchronizer in the synchronization process. 

The function ValidateMessage(), Figure 4, used by the monitors determines whether a received 
Sync message is valid. We assume physical-layer error detections are dealt with separately. The 
function ConsumeMessage() used by the monitors invalidates the stored Sync message after it 
has been kept for one local clock tick. The function ValidSync() used by the synchronizer 
examines availability of valid Sync messages. 


ValidateMessage():
    if (incoming message = Sync) then
        {Message is valid,
         Store it.}

ConsumeMessage():
    if (stored message timer ≥ 1 tick) then
        {Message is expired,
         Clear it.}

ValidSync():
    if (number of stored messages > 0) then
        {return true,
    else
         return false.}


Figure 4. The protocol functions. 
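The three functions of Figure 4 can be sketched in Python as follows. This is one possible reading, not the report's implementation: the class name Monitor, the string encoding of the Sync message, and the convention that each monitor runs ValidateMessage() and then ConsumeMessage() once per local clock tick, before the synchronizer, are all assumptions.

```python
class Monitor:
    """Holds the last valid Sync message from one source node (sketch)."""

    SYNC = "Sync"  # hypothetical encoding of the single message type

    def __init__(self) -> None:
        self.stored = False  # True while a valid Sync message is held
        self.age = 0         # local clock ticks the message has been kept

    def validate_message(self, incoming) -> None:
        """ValidateMessage(): a received Sync message is valid, so store it."""
        if incoming == self.SYNC:
            self.stored = True
            self.age = 0

    def consume_message(self) -> None:
        """ConsumeMessage(): clear the stored message once it is one tick old."""
        if self.stored:
            if self.age >= 1:
                self.stored = False
                self.age = 0
            else:
                self.age += 1


def valid_sync(monitors) -> bool:
    """ValidSync(): true if at least one monitor holds a valid Sync message."""
    return sum(1 for m in monitors if m.stored) > 0


# A message arrives in tick k and is visible to the synchronizer in that tick.
m = Monitor()
m.validate_message("Sync")
m.consume_message()
print(valid_sync([m]))
# In tick k + 1 no message arrives, and the stored message expires.
m.consume_message()
print(valid_sync([m]))
```

Under this tick ordering a stored message is visible to the synchronizer for exactly one evaluation, which is one way to realize "kept for one local clock tick".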


² Since only one message type is used for the operation of this protocol, a single bit suffices.

3.4. Protocol Assumptions


1. K > 1.

2. All nodes correctly execute the protocol.

3. All links correctly transmit data from their sources to their destinations.

4. T is a non-partitioned, strongly connected digraph.

5. 0 ≤ ρ ≪ 1.

6. A message sent by a node will be received and processed by its adjacent nodes within γ,
where γ = (D + d).

7. The initial values of the variables of a node are within their corresponding data-type 
range, although possibly with arbitrary values. (In an implementation, it is expected that 
some local mechanism exists to enforce type consistency for all variables.) 


3.5. The Self-Stabilizing Distributed Clock Synchronization Problem 

To simplify the presentation of this protocol, it is assumed that all time references are with
respect to an initial real time t_0, where t_0 = 0 and for all t ≥ t_0 the system operates within the
protocol assumptions. The maximum difference in the value of LocalTimer for all pairs of nodes
at time t, Δ_Net(t), is determined by the following equation that accounts for the variations in the
values of the LocalTimer across all nodes.

r = ⌈(W + 1)(γ + δ(γ))⌉,

LocalTimer_min(x) = min(N_i.LocalTimer(x)), and
LocalTimer_max(x) = max(N_i.LocalTimer(x)), for all i.

Δ_Net(t) = min((LocalTimer_max(t) − LocalTimer_min(t)),
               (LocalTimer_max(t − r) − LocalTimer_min(t − r))).
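The network-wide measure can be sketched directly from its definition. The function names and sample timer values below are hypothetical. The example illustrates one reading of why the minimum over two sampling instants is taken: just after some nodes have reset their LocalTimer while others are about to, the instantaneous spread is large even though the network is in synchrony.

```python
import math


def lookback(W: int, gamma: float, delta_gamma: float) -> int:
    """r: the smallest integer not below (W + 1) * (gamma + delta(gamma))."""
    return math.ceil((W + 1) * (gamma + delta_gamma))


def delta_net(timers_now, timers_earlier):
    """The smaller of the LocalTimer spread at time t and at time t - r.

    timers_now and timers_earlier hold one LocalTimer value per node."""
    spread_now = max(timers_now) - min(timers_now)
    spread_earlier = max(timers_earlier) - min(timers_earlier)
    return min(spread_now, spread_earlier)


# Hypothetical snapshot with P = 100: two nodes have just reset and one has not.
print(delta_net([2, 99, 1], [97, 94, 96]))
```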


The following symbols were defined earlier and are listed here for reference: 

• P denotes the resynchronization period, has units of real time clock ticks, and is defined
as the upper bound on the time interval between any two consecutive resets of the
LocalTimer by a node and P > 0.

• C denotes a bound on the maximum convergence time.

• Δ_Net(t), for real time t, is the maximum difference of values of the LocalTimers of any two
nodes (i.e., the relative clock skew) for t ≥ t_0.

• π, the synchronization precision, is the guaranteed upper bound on Δ_Net(t) for all t ≥ C,
and is generally assumed to be very small compared to P.

To show that a protocol is self-stabilizing, it has to be proven that there exist C and π such that
the following self-stabilization properties hold.

1. Convergence: Δ_Net(C) ≤ π, 0 ≤ π < P

2. Closure: For all t ≥ C, Δ_Net(t) ≤ π

3. Congruence: For all nodes N_i, for all t ≥ C, (N_i.LocalTimer(t) = γ) implies Δ_Net(t) ≤ π.

4. Liveness: For all t ≥ C, LocalTimer of every node sequentially takes on at least all
integer values in [γ, P − π].

3.6. The Self-Stabilizing Distributed Clock Synchronization Protocol For Arbitrary
Digraphs

The protocol, executed by all nodes, is presented in Figure 5 and consists of a synchronizer and a
set of monitors which execute once every local clock tick. The if statement describing the
synchronizer has five parts that are labeled E0 through E4 and referenced subsequently in this
report.


Synchronizer:

E0: if (LocalTimer < 0)
        LocalTimer := 0,

E1: elseif (ValidSync() and (LocalTimer < D))
        LocalTimer := γ,    // interrupted

E2: elseif (ValidSync() and (LocalTimer ≥ T_S))
        LocalTimer := γ,    // interrupted
        Transmit Sync,

E3: elseif (LocalTimer ≥ P)    // timed out
        LocalTimer := 0,
        Transmit Sync,

E4: else
        LocalTimer := LocalTimer + 1.

Figure 5. The self-stabilizing clock synchronization protocol for arbitrary digraphs³.
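The synchronizer of Figure 5 can be sketched as a pure function of the current LocalTimer and the result of ValidSync(). The function name, the return convention, and the sample parameter values are assumptions for illustration. The comparisons follow the ignore window [D, T_S) of Section 3.1, so a Sync message is accepted below D or at and above T_S.

```python
from typing import Tuple


def synchronizer_step(local_timer: int, valid_sync: bool,
                      D: int, gamma: int, T_S: int, P: int) -> Tuple[int, bool]:
    """One local clock tick of the synchronizer.

    Returns (new LocalTimer value, True if a Sync message is transmitted)."""
    if local_timer < 0:                       # E0: restore a value in range
        return 0, False
    elif valid_sync and local_timer < D:      # E1: interrupted, no relay
        return gamma, False
    elif valid_sync and local_timer >= T_S:   # E2: interrupted, relay Sync
        return gamma, True
    elif local_timer >= P:                    # E3: timed out
        return 0, True
    else:                                     # E4: count; also covers ignored messages
        return local_timer + 1, False


# Hypothetical parameters: D = 1, gamma = 2, T_S = 20, P = 100.
print(synchronizer_step(10, True, 1, 2, 20, 100))   # inside the ignore window
print(synchronizer_step(20, True, 1, 2, 20, 100))   # accepted and relayed
```

A message that arrives inside the ignore window falls through to E4, so the node keeps counting as if nothing had been received.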

The following is a list of protocol parameters when all links are bidirectional.

T_S ≥ (L+2)(γ + δ(γ))

P ≥ 3T_S, for ρ = 0

P ≥ 3(T_S + δ(T_S)), for L = K and ρ ≥ 0

P ≥ max((2K+1)(γ + δ(γ)), 3(T_S + δ(T_S))), for L = f(T) and ρ ≥ 0

The following is a list of protocol parameters for digraphs, i.e., when at least one link is
unidirectional.

T_S ≥ (K+2)(γ + δ(γ))

P ≥ K(T_S + δ(T_S))

Regardless of the types of links in the network, the following is a list of protocol measures.

C_Init = 2P + K(γ + δ(γ))

Δ_Init ≤ (K−1)(γ + δ(γ))

C = C_Init + ⌈Δ_Init/γ⌉P

0 ≤ Δ_InitGuaranteed ≤ W(γ + δ(γ)), for all t ≥ C

π = Δ_InitGuaranteed + δ(P), 0 ≤ π < P, for all t ≥ C

A trivial solution is when P = 0. Since P > T_S, T_S > 0, and the LocalTimer is reset after reaching
P (worst-case wraparound), a trivial solution is not possible.
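As a worked example of the digraph parameters and the protocol measures, the sketch below evaluates them for given K, W, D, d, and a drift rate. It assumes the smallest permitted T_S and P and the largest values of the initial precisions; the function and key names are hypothetical, and the expression used for the precision (initial guaranteed precision plus the drift over one period P) is a reading of the measures listed above.

```python
import math


def drift(t: float, rho: float) -> float:
    """Relative drift over an interval t, per Equation (1)."""
    return ((1 + rho) - 1 / (1 + rho)) * t


def digraph_parameters(K: int, W: int, D: float, d: float, rho: float) -> dict:
    """Protocol parameters and measures when at least one link is unidirectional."""
    gamma = D + d                                 # communication latency
    step = gamma + drift(gamma, rho)              # latency plus drift over one latency
    T_S = (K + 2) * step                          # smallest graph threshold
    P = K * (T_S + drift(T_S, rho))               # smallest resynchronization period
    C_init = 2 * P + K * step                     # time of initial synchrony
    delta_init = (K - 1) * step                   # largest initial precision
    C = C_init + math.ceil(delta_init / gamma) * P
    delta_init_guaranteed = W * step              # largest guaranteed initial precision
    pi = delta_init_guaranteed + drift(P, rho)    # assumed form of the precision
    return {"gamma": gamma, "T_S": T_S, "P": P, "C_init": C_init,
            "delta_init": delta_init, "C": C, "pi": pi}


# Hypothetical four-node unidirectional ring (K = 4, W = 3) with ideal oscillators.
print(digraph_parameters(K=4, W=3, D=1, d=1, rho=0.0))
```

With ideal oscillators every drift term vanishes, which makes the arithmetic easy to follow by hand: the latency is 2, T_S is 12, P is 48, and C is 248.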


³ Statement E0 makes explicit assumption 7 in Section 3.4 (Protocol Assumptions) as it applies to the
LocalTimer.


Monitor:

case (message from the corresponding node)
    {Sync:
        ValidateMessage()
     Other:
        Do nothing.
    } // case

ConsumeMessage()

4. A Deductive Proof Of The Correctness Of The Protocol


In this section we present a deductive proof of the correctness of the protocol for the general
case: a realizable system independent of the network topology where ρ ≥ 0 and d ≥ 0. Defining
T_S in terms of L requires knowledge of the topology of the given network. Therefore, in order to
generalize the expression for T_S, make it independent of the topology, and to help simplify the
proof process, we express it in terms of its worst case value, L = K, which implies that
T_S ≥ (K+2)(γ + δ(γ)). However, for a specific application, optimizing T_S by expressing it in terms
of L results in faster synchrony and better performance.

The deductive proof presented in this section is for the protocol presented in the previous section
(Section 3.6) and reported earlier in [Mai 2011A]. We do not provide separate proofs for the
variations of the protocol that were discussed in [Mai 2011A]. We believe, however, that this proof
can readily be extended to all variations of the protocol.

The proof idea is depicted in Figure 6. The main theorems address the following questions.
Assuming a Sync message does not get ignored and P is sufficiently large, is it possible for a
message to circulate within the network without dying out? In other words, will E2 (Figure 5)⁴
get executed indefinitely? Is it possible for a node to transmit Sync messages without ever
timing out? In other words, will E3 ever get executed? Also, will E4 ever get executed?



The ranges of the protocol parameters used in the proofs are restated here for reference.

K ≥ 2 (K = 1 is a simple case and does not need a proof.)

L = K
W = K − 1

0 ≤ ρ ≪ 1

1 ≤ D ≤ γ < T_S < P
0 ≤ LocalTimer ≤ P
Ignore window = [D, T_S)

T_S ≥ (K+2)(γ + δ(γ))

P ≥ K(T_S + δ(T_S))

We'd like to emphasize that P is defined as the upper bound on the time interval between any
two consecutive resets by any node in the network, thus, its value is specified with respect to the
slowest node. In other words, assuming the fastest node and the slowest node start at initial
synchrony, when the fastest node reaches P, the slowest node is at P − δ(P).


⁴ Labels E0 through E4 of Figure 5 refer to different parts of the synchronizer.

Lemma JoinTimedOut - When a node, N_j, is interrupted by an adjacent node that was timed
out, N_i, it synchronizes with that node with a relative initial precision of Δ_ij = d + δ(D).

Proof - Given N_j was interrupted at time t by an incoming Sync message from N_i, by the
protocol (parts E1, E2), N_j sets its LocalTimer while accounting for the message arrival delay,
i.e., N_j.LocalTimer = γ. Given N_i had timed out (part E3 of the protocol), it must have reset
its LocalTimer to 0 and sent the Sync message between D and γ time units earlier.
Accounting for drift, at time t,

N_i.LocalTimer = 0 + actual communication delay + relative drift to N_j.

The communication delay is bounded by [D, γ]. Thus, with N_i as either the slower node or
the faster node than N_j,

D − δ(D) ≤ N_i.LocalTimer ≤ γ + δ(γ).

Since the relative precision is an upper bound,

Δ_ij = max(abs(N_i.LocalTimer − N_j.LocalTimer)).

So,

Δ_ij = (γ + δ(γ)) − γ = δ(γ), and
Δ_ij = γ − (D − δ(D)) = d + δ(D).

Since δ(γ) = δ(D) + δ(d) and 0 ≤ ρ ≪ 1, then d ≥ δ(d), and therefore,

Δ_ij = max(δ(γ), d + δ(D))

Δ_ij = d + δ(D). ■

Lemma JoinInterrupted - When a node, N_j, is interrupted by an adjacent node that was in
turn interrupted, N_i, it synchronizes with that node with a relative initial precision of
Δ_ij = γ + δ(γ).

Proof - Given N_j was interrupted at time t by an incoming Sync message from N_i, by the
protocol (parts E1, E2), N_j sets its LocalTimer while accounting for the message arrival
delay, i.e., N_j.LocalTimer = γ. Given N_i was interrupted (parts E1, E2 of the protocol), it
must have set its LocalTimer to γ and sent the Sync message between D and γ time units
earlier. Accounting for drift, at time t,

N_i.LocalTimer = γ + actual communication delay + relative drift to N_j.

The communication delay is bounded by [D, γ]. Thus, with N_i as either the slower node or
the faster node than N_j,

γ + D − δ(D) ≤ N_i.LocalTimer ≤ γ + γ + δ(γ).

Since the relative precision is an upper bound,

Δ_ij = max(abs(N_i.LocalTimer − N_j.LocalTimer)).

So,

Δ_ij = (2γ + δ(γ)) − γ = γ + δ(γ), and
Δ_ij = (γ + D − δ(D)) − γ = D − δ(D).

Since γ ≥ D,

Δ_ij = max(γ + δ(γ), D − δ(D))

Δ_ij = γ + δ(γ). ■
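The two case analyses in Lemma JoinTimedOut and Lemma JoinInterrupted can be checked numerically. The sketch below evaluates both extremes of the sender's LocalTimer and takes the larger gap to the receiver, whose LocalTimer is set to the communication latency. Function names and sample values are illustrative assumptions.

```python
def drift(t: float, rho: float) -> float:
    """Relative drift over an interval t, per Equation (1)."""
    return ((1 + rho) - 1 / (1 + rho)) * t


def join_timed_out_precision(D: float, d: float, rho: float) -> float:
    """Sender timed out (LocalTimer reset to 0); receiver sets LocalTimer to D + d."""
    gamma = D + d
    sender_ahead = (gamma + drift(gamma, rho)) - gamma   # sender at its upper bound
    sender_behind = gamma - (D - drift(D, rho))          # sender at its lower bound
    return max(sender_ahead, sender_behind)


def join_interrupted_precision(D: float, d: float, rho: float) -> float:
    """Sender was interrupted (LocalTimer set to D + d) before relaying."""
    gamma = D + d
    upper_gap = (2 * gamma + drift(gamma, rho)) - gamma
    lower_gap = (gamma + D - drift(D, rho)) - gamma
    return max(upper_gap, lower_gap)


# Hypothetical values: D = 5, d = 2, drift rate 1e-4.
print(join_timed_out_precision(5, 2, 1e-4))
print(join_interrupted_precision(5, 2, 1e-4))
```

For these values the first result equals d plus the drift over D and the second equals the latency plus the drift over one latency, in agreement with the two lemmas.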

We have defined the ignore window earlier as the predefined time interval (window) where the
node ignores all incoming Sync messages as a means for the protocol to stop the endless cycle of
resynchronization triggered by the subsequent Sync messages. From the protocol (parts E1, E2),
the ignore window for N_i is the time interval of D ≤ N_i.LocalTimer < T_S.

Lemma IgnoredNode - When a node, N_j, ignores a Sync message from an adjacent node, N_i,
the two nodes have a relative initial precision of Δ_ij ≤ T_S − D + δ(D).

Proof - Given, at time t, N_j ignored an incoming Sync message from N_i, by the protocol, then
D ≤ N_j.LocalTimer < T_S. Given N_i had sent the Sync message between D and γ time units
earlier, by the protocol (parts E2, E3), N_i was either interrupted or had timed out. We address
these cases separately.

Case 1 - Given N_i was interrupted, it must have set its LocalTimer to γ and sent the Sync
message between D and γ time units earlier (part E2 of the protocol). Accounting for drift, at
time t,

N_i.LocalTimer = γ + actual communication delay + relative drift to N_j.

The communication delay is bounded by [D, γ]. Thus, with N_i as either the slower node or
the faster node than N_j,

γ + D − δ(D) ≤ N_i.LocalTimer ≤ γ + γ + δ(γ). (1)

Since the relative precision is an upper bound,

Δ_ij = max(abs(N_i.LocalTimer − N_j.LocalTimer))

Δ_ij ≤ T_S − (γ + D − δ(D)).

Case 2 - Given N_i had timed out, it must have reset its LocalTimer to 0 and sent the Sync
message between D and γ time units earlier (part E3 of the protocol). Accounting for drift,
at time t,

N_i.LocalTimer = 0 + actual communication delay + relative drift to N_j.

The communication delay is bounded by [D, γ]. Thus, with N_i as either the slower node or
the faster node than N_j,

D − δ(D) ≤ N_i.LocalTimer ≤ γ + δ(γ). (2)

Since the relative precision is an upper bound,

Δ_ij = max(abs(N_i.LocalTimer − N_j.LocalTimer))

Δ_ij ≤ T_S − (D − δ(D)).

Since the relative precision is an upper bound, from the above cases, Δ_ij ≤ T_S − D + δ(D). ■
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Lemma JoinTimedOutAfterlgnored - When all adjacent nodes are within a relative precision 
of Ay < Ts - D+3(D) clock ticks of each other, for any two adjacent nodes N and Nj with 
Nj.LocalTimer < Nj.LocalTimer, ifNj times out, Nj will join Nj with a relative initial precision 
of Aij = d + 3(D). 

Proof - Given, at time $t$, the relative precision of $\Delta_{ij}$ and $N_i$ lagging $N_j$ by $\Delta_{ij}$, since $0 < \rho \ll 1$, 
$D \gg \delta(D)$, $\Delta_{ij} < T_S$, and so $P - \Delta_{ij} > P - T_S$. 

$P > K T_S + K\delta(T_S)$, subtracting $T_S$ from both sides, 

$P - T_S > (K-1)T_S + K\delta(T_S)$, and so, 

$P - \Delta_{ij} > (K-1)T_S + K\delta(T_S)$. 

At time $t' = t + (P - \Delta_{ij})$, when $N_j$ times out, $N_i.LocalTimer$ is determined as follows. For the 
worst case analysis, we assume $N_i$ is the slower node than $N_j$. Also, even though $P - \Delta_{ij} < P$, 
i.e., $\delta(P - \Delta_{ij}) < \delta(P)$, we set $\delta(P - \Delta_{ij}) = \delta(P)$ to simplify the algebraic argument. 

$N_i.LocalTimer(t') = N_i.LocalTimer(t) + (P - \Delta_{ij}) - \delta(P - \Delta_{ij})$ 

$N_i.LocalTimer(t') > N_i.LocalTimer(t) + ((K-1)T_S + K\delta(T_S)) - \delta(P)$ 

Since $\delta(P) = K\delta(T_S)$, 

$N_i.LocalTimer(t') > N_i.LocalTimer(t) + (K-1)T_S$ 

From inequalities (1) and (2) of the proof of Lemma IgnoredNode, and since we consider for 
the worst case analysis $N_i$ being the slower node than $N_j$, 

$N_i.LocalTimer(t) = D - \delta(D)$, and 
$N_i.LocalTimer(t') > D - \delta(D) + (K-1)T_S$. 

Since $0 < \rho \ll 1$, $D \gg \delta(D)$, and for all $K > 2$, 

$N_i.LocalTimer(t') > T_S$. 

Thus, when $N_j$ times out, $N_i.LocalTimer > T_S$ and $N_i$ accepts $N_j$'s message. By the Lemma 
JoinTimedOut the two adjacent nodes $N_i$ and $N_j$ synchronize with each other with a relative 
initial precision of $\Delta_{ij} = d + \delta(D)$ clock ticks. ■ 
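
The worst-case arithmetic of this proof can be checked numerically. The following is a minimal sketch, not part of the protocol: it assumes a linear drift model $\delta(t) = \rho t$ and a threshold $T_S = (K+2)(\gamma + \delta(\gamma))$, neither of which is stated in this section, and the function names are hypothetical.

```python
from fractions import Fraction


def drift(ticks, rho):
    """Assumed drift model for this sketch: delta(t) = rho * t."""
    return rho * ticks


def lagging_timer_at_timeout(K, D, d, rho):
    """Return (lower bound on N_i.LocalTimer when N_j times out, T_S).

    Follows the worst case of the proof: N_i starts at D - delta(D)
    and advances by at least (K - 1) * T_S before N_j times out.
    """
    gamma = D + d                                  # communication latency
    T_S = (K + 2) * (gamma + drift(gamma, rho))    # assumed graph threshold
    start = D - drift(D, rho)                      # slowest admissible N_i at time t
    return start + (K - 1) * T_S, T_S


timer, threshold = lagging_timer_at_timeout(4, 10, 2, Fraction(1, 10000))
print(timer > threshold)  # the lagging node is outside the ignore window
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is not affected by floating-point rounding.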

Lemma JoinInterruptedAfterIgnored1 - When at time $t$ all adjacent nodes are within a 
relative precision of $\Delta_{ij} < T_S - D + \delta(D)$ clock ticks of each other, for any two adjacent nodes 
$N_i$ and $N_j$ with $N_i.LocalTimer < N_j.LocalTimer$, if $N_j$ gets interrupted by another node at 
$t' > t + T_S + \delta(T_S)$, $N_i$ will join $N_j$ with a relative initial precision of $\Delta_{ij} = \gamma + \delta(\gamma)$. 

Proof - Given the relative precision of $\Delta_{ij}$ and $N_i$ lagging $N_j$ by $\Delta_{ij}$, at time $t$, for the worst 
case analysis, we assume $N_i$ is the slower node than $N_j$. At time $t' = t + T_S + \delta(T_S)$, 

$N_i.LocalTimer(t') = N_i.LocalTimer(t) + T_S + \delta(T_S) - \delta(T_S)$, and so, 

$N_i.LocalTimer(t') = N_i.LocalTimer(t) + T_S$ 

From inequalities (1) and (2) of the proof of Lemma IgnoredNode, and since we consider for 
the worst case analysis $N_i$ being the slower node than $N_j$, 

$N_i.LocalTimer(t) = D - \delta(D)$, and 
$N_i.LocalTimer(t') > D - \delta(D) + T_S$. 

Since $0 < \rho \ll 1$, $D \gg \delta(D)$, therefore, 

$N_i.LocalTimer(t') > T_S$. 

Thus, when $N_j$ gets interrupted, $N_i.LocalTimer > T_S$ and $N_i$ accepts $N_j$'s message. By the Lemma 
JoinInterrupted the two adjacent nodes $N_i$ and $N_j$ synchronize with each other with a relative 
initial precision of $\Delta_{ij} = \gamma + \delta(\gamma)$ clock ticks. ■ 

Lemma JoinInterruptedAfterIgnored2 - When all adjacent nodes are within a relative 
precision of $\Delta_{ij} < T_S - D + \delta(D)$ clock ticks of each other, for any two adjacent nodes $N_i$ and $N_j$ 
with $N_i.LocalTimer < N_j.LocalTimer$, if $N_i$ gets interrupted by another node before $N_j$ either 
times out or gets interrupted by yet another node, $N_j$ will join $N_i$ with a relative initial 
precision of $\Delta_{ij} = \gamma + \delta(\gamma)$. 

Proof - Given the relative precision of $\Delta_{ij}$, $N_i$ lagging $N_j$ by $\Delta_{ij}$, and $N_i$ being interrupted by 
another node (other than $N_j$), by the protocol (part E3), $T_S < N_i.LocalTimer < P$, i.e., $N_i$ had 
exited the ignore window. Since $N_i$ lags $N_j$, $T_S < N_j.LocalTimer < P$. By the Lemma 
JoinInterrupted the two adjacent nodes $N_i$ and $N_j$ synchronize with each other with a relative 
initial precision of $\Delta_{ij} = \gamma + \delta(\gamma)$ clock ticks. ■ 

Note that in the general form of the above lemma, if either of the two nodes times out, the other one 
will follow and $\Delta_{ij} = d + \delta(D)$. However, since $N_i.LocalTimer < N_j.LocalTimer$, if $N_i$ times out 
before $N_j$, it implies that $N_i$ is a considerably faster node and the amount of drift in the system is 
very large: $\delta(P) > T_S$. 

It follows from the protocol that an incoming Sync message to a node either gets ignored or gets 
accepted and subsequently relayed to other nodes. A Sync message is said to die out when all 
receiving nodes ignore it. Assuming the message does not get ignored and P is sufficiently large, 
is it possible for a Sync message to circulate within the network without ever dying out? In other 
words, will E2 get executed indefinitely? Is it possible for a node to transmit Sync messages 
without ever timing out? In other words, will E3 ever get executed? How about E4? The 
following lemmas address these questions. 

Lemma NoInfiniteLoop - A Sync message always dies out. 

Proof - By the protocol (parts E1, E2) a message dies out if it gets ignored by all receiving 
nodes ($D < N_i.LocalTimer < T_S$); otherwise, it gets relayed to other nodes. For a message to 
persist it has to circulate within the network forever, i.e., there needs to be a loop in the 
network. By definition, $L$ signifies the size of the largest loop in the network. Suppose all nodes 
in a loop accept a Sync message when they receive it and relay it, and 
no new messages are generated due to a time out. Let, at time $t$, $N_i$ be the first node in that 
loop that just got interrupted by another node. So, $N_i.LocalTimer = \gamma$ and $N_i$ will relay the 
Sync message to other nodes inside the loop. At time $t'$, this message will make it back to $N_i$ 
after traversing the loop. Accounting for drift and assuming $N_i$ is the fastest node, at time $t'$, 

$N_i.LocalTimer = \gamma + L\gamma + L\delta(\gamma)$, 

$N_i.LocalTimer = (L+1)\gamma + L\delta(\gamma)$. 

By the protocol (parts E1, E2), $N_i$ will ignore the message if $D < N_i.LocalTimer < T_S$. At 
time $t'$ for $N_i$, 

$D < (L+1)\gamma + L\delta(\gamma)$, since $L > 1$, $\gamma > D$, and $L\delta(\gamma) > 0$, and 
$(L+1)\gamma + L\delta(\gamma) < T_S$ 
$(L+1)\gamma + L\delta(\gamma) < (K+2)\gamma + (K+2)\delta(\gamma)$. 

Since $L \le K$, substituting $K$ for $L$ in the above inequality, we get the following. 

$(K+1)\gamma + K\delta(\gamma) < (K+2)\gamma + (K+2)\delta(\gamma)$ 

$0 < \gamma + 2\delta(\gamma)$. 

Since $\gamma > 1$ and $2\delta(\gamma) > 0$, the above inequality holds true. Thus, $D < N_i.LocalTimer < T_S$ at 
time $t'$ and $N_i$ will ignore the message. Therefore, the message will die out even if it reaches 
its original source. As a result, it is impossible for a message to persist within the network 
indefinitely. ■ 
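
The loop argument above can be exercised for every admissible loop size. This is a minimal sketch under stated assumptions: drift is modeled as $\delta(t) = \rho t$, the threshold is taken as $T_S = (K+2)(\gamma + \delta(\gamma))$ as suggested by the inequality in the proof, and the function names are hypothetical.

```python
from fractions import Fraction


def returning_timer(L, gamma, rho):
    """LocalTimer of the relaying node when its message returns around a loop of size L."""
    return (L + 1) * gamma + L * rho * gamma


def message_dies_in_loop(K, D, d, rho):
    """True if, for every loop size 1..K, the returning message lands in the ignore window."""
    gamma = D + d
    T_S = (K + 2) * (gamma + rho * gamma)   # assumed graph threshold
    return all(D < returning_timer(L, gamma, rho) < T_S for L in range(1, K + 1))


print(message_dies_in_loop(5, 10, 2, Fraction(1, 10000)))
```

The check covers all loop sizes up to $K$ rather than only the largest one, since the proof substitutes $K$ for $L$ as the worst case.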

Lemma MessageLifeSpan - A Sync message dies out in at most $K(\gamma + \delta(\gamma))$ clock ticks. 

Proof - By the Lemma NoInfiniteLoop, a Sync message always dies out. It also follows that 
if there is a loop in the graph, it takes at most $L(\gamma + \delta(\gamma))$ clock ticks for a message to reach 
its source and then die out. By the definition of $W$, it takes at most $W(\gamma + \delta(\gamma))$ clock ticks for 
a message to reach all other nodes. Once again, by the Lemma NoInfiniteLoop, this Sync 
message always dies out. Since $L \le K$ and $W \le K - 1$, for the worst case analysis we choose 
the largest value for $L$ and $W$, i.e., $K$. Thus, it takes at most $K(\gamma + \delta(\gamma))$ clock ticks for a Sync 
message to die out. ■ 

Lemma IncLocalTimer - The LocalTimer of at least one node always reaches P. 

Proof - By the Lemma NoInfiniteLoop a Sync message always dies out. Also, by the Lemma 
NoInfiniteLoop unless new Sync messages are generated, all existing Sync messages in the 
network will eventually die out. In the absence of a Sync message circulating within the 
network, by the protocol (part E4), the LocalTimer gets incremented until it reaches its 
maximum value $P$. Therefore, the LocalTimer of at least one node always reaches $P$. ■ 

Lemma NewSync - Within every time interval of P clock ticks, at least one node generates a 
new Sync message. 

Proof - By the Lemma NoInfiniteLoop a Sync message always dies out. By the Lemma 
MessageLifeSpan a Sync message dies out in at most $K(\gamma + \delta(\gamma))$ clock ticks. Since $K(\gamma + \delta(\gamma)) < P$, by 
the Lemma IncLocalTimer at least one node reaches $P$. By the protocol (part E3), once a 
node reaches $P$, it times out and generates a new Sync message. Therefore, within every $P$ at 
least one node always times out and generates a new Sync message. ■ 
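
The premise $K(\gamma + \delta(\gamma)) < P$ used in this proof can be illustrated by comparing the message life span with the smallest period allowed by $P > K T_S + K\delta(T_S)$. The sketch below is illustrative only: it assumes $\delta(t) = \rho t$ and $T_S = (K+2)(\gamma + \delta(\gamma))$, and the function name is hypothetical.

```python
from fractions import Fraction


def lifespan_and_min_period(K, D, d, rho):
    """Return (message life span bound, boundary value K*T_S + K*delta(T_S) of the period)."""
    gamma = D + d
    T_S = (K + 2) * (gamma + rho * gamma)    # assumed graph threshold
    lifespan = K * (gamma + rho * gamma)     # Lemma MessageLifeSpan
    min_period = K * T_S + K * rho * T_S     # P must exceed this value
    return lifespan, min_period


lifespan, min_period = lifespan_and_min_period(5, 10, 2, Fraction(1, 10000))
print(lifespan < min_period)
```

Because $T_S$ already exceeds $\gamma + \delta(\gamma)$ by a factor of $K + 2$, the life span stays well below any admissible period under these assumptions.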

Lemma AllNodesTxSync - Within every time interval of $P$ clock ticks, every node $N_i$ 
transmits at least one Sync message. 

Proof - We prove this lemma by examining all activities along the time line. 

By the protocol (part E1), when $N_i.LocalTimer < D$, if $N_i$ does not receive a Sync message, its 
LocalTimer gets incremented (part E4), reaches $D$, and the node enters the ignore window. 
However, if $N_i$ receives a Sync message, it accepts the message without relaying it. In this 
case, the node sets its LocalTimer to $\gamma$ and, since $\gamma > D$, this event occurs only once and the 
node enters the ignore window. 

By the protocol, while in the ignore window (parts E1, E2), $D < N_i.LocalTimer < T_S$, the node 
rejects all incoming Sync messages and the LocalTimer gets incremented (part E4), reaches 
$T_S$, and the node exits the ignore window. 

By the protocol, after exiting the ignore window (part E3), $T_S < N_i.LocalTimer < P$, if $N_i$ 
receives a Sync message from another node, it will accept and relay it; otherwise, its 
LocalTimer gets incremented (part E4), reaches its maximum value $P$, and the node times 
out, generates a new Sync message, and transmits it to others. 

Therefore, within any $P$ clock ticks, every node $N_i$ transmits at least one Sync message. ■ 
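
The three timer ranges walked through in this proof can be sketched as a small per-tick state function. This is a hedged illustration of the behavior as summarized above, not the protocol's definition: the boundary handling at $D$ and $T_S$, the one-tick granularity, and all names (`Phase`, `phase`, `step`, `ticks_until_transmit`) are assumptions of this sketch.

```python
from enum import Enum


class Phase(Enum):
    ACCEPT_NO_RELAY = 1    # LocalTimer < D: accept a Sync without relaying
    IGNORE = 2             # D <= LocalTimer < T_S: ignore window
    ACCEPT_AND_RELAY = 3   # T_S <= LocalTimer < P: accept and relay


def phase(local_timer, D, T_S, P):
    """Classify a LocalTimer value into one of the three ranges of the proof."""
    if local_timer < D:
        return Phase.ACCEPT_NO_RELAY
    if local_timer < T_S:
        return Phase.IGNORE
    if local_timer < P:
        return Phase.ACCEPT_AND_RELAY
    raise ValueError("LocalTimer must be below P")


def step(local_timer, sync_received, D, gamma, T_S, P):
    """Advance one clock tick; return (new LocalTimer, whether a Sync was transmitted)."""
    if sync_received:
        current = phase(local_timer, D, T_S, P)
        if current is Phase.ACCEPT_NO_RELAY:
            return gamma, False
        if current is Phase.ACCEPT_AND_RELAY:
            return gamma, True
        # In the ignore window the message is dropped and the timer keeps counting.
    nxt = local_timer + 1
    if nxt >= P:
        return 0, True     # time out: reset to 0 and send a new Sync
    return nxt, False


def ticks_until_transmit(start, D, gamma, T_S, P):
    """Ticks until an undisturbed node transmits, or None if it does not within P ticks."""
    timer = start
    for n in range(1, P + 1):
        timer, sent = step(timer, False, D, gamma, T_S, P)
        if sent:
            return n
    return None


print(all(ticks_until_transmit(s, 10, 12, 84, 500) is not None for s in range(500)))
```

The final line checks only the undisturbed case: a node that receives no Sync message times out, and therefore transmits, within $P$ ticks from any starting timer value.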

Lemma DeltaijLessThanTs - For all $t > P + \gamma$ clock ticks the relative initial precision of any 
two adjacent nodes $N_i$ and $N_j$ is $\Delta_{ij} < T_S - D + \delta(D)$. 

Proof - By the Lemma AllNodesTxSync, within every time interval of $P$ clock ticks every 
node transmits at least one Sync message. Accounting for the message processing and 
delivery time, i.e., $\gamma$, by the Lemmas JoinTimedOut, JoinInterrupted, and IgnoredNode, the 
relative initial precision of any two adjacent nodes $N_i$ and $N_j$ within $P + \gamma$ is $d + \delta(D)$, $\gamma + \delta(\gamma)$, 
or less than $T_S - D + \delta(D)$ clock ticks, respectively. Since the relative precision is an upper 
bound, therefore, $\Delta_{ij} < T_S - D + \delta(D)$. ■ 

Lemma PrecisionLessThanWTs - For all $t > P + \gamma$ clock ticks the initial network precision is 
$\Delta_{Net}(t) < W(T_S - D + \delta(D))$. 

Proof - By the Lemma DeltaijLessThanTs, for all $t > P + \gamma$ clock ticks the relative initial 
precision of any two adjacent nodes is $\Delta_{ij} < T_S - D + \delta(D)$. Since the initial precision of the 
network is an upper bound, by the definition of $W$, at $t = P + \gamma$, $\Delta_{Net}(t)$ is the sum of all $\Delta_{ij}$, i.e., 
$\Delta_{Net}(t) = W\Delta_{ij}$. Thus, $\Delta_{Net}(t) < W(T_S - D + \delta(D))$. ■ 

Lemma DeltaijAboutGamma - For all $t > 2P + \gamma$ clock ticks and upon the subsequent 
resynchronization process, the relative initial precision of any two adjacent nodes $N_i$ and $N_j$ 
is $\Delta_{ij} = \gamma + \delta(\gamma)$. 

Proof - By the Lemma PrecisionLessThanWTs, for all $t > P + \gamma$ clock ticks the initial network 
precision is $\Delta_{Net}(t) < W(T_S - D + \delta(D))$. Within another $P$, i.e., at $t = 2P + \gamma$, $\Delta_{Net}(t)$ increases 
by the maximum drift for the duration of $P$, i.e., $\delta(P)$. So, 

$\Delta_{Net}(t) < W(T_S - D + \delta(D)) + \delta(P)$. 

With $W \le K - 1$, we set $W = K - 1$ for the worst case analysis, and since $\delta(P) = K\delta(T_S)$, 

$\Delta_{Net}(t) < (K-1)(T_S - D + \delta(D)) + K\delta(T_S)$. 

Since $P > K T_S + K\delta(T_S)$, 

$P - \Delta_{Net}(t) > (K T_S + K\delta(T_S)) - ((K-1)(T_S - D + \delta(D)) + K\delta(T_S))$, and so, 

$P - \Delta_{Net}(t) > T_S + (K-1)(D - \delta(D))$. 

Since $0 < \rho \ll 1$, $D \gg \delta(D)$ and $P - \Delta_{Net}(t) > T_S + (K-1)D$. 

Thus, at $t = 2P + \gamma$, $P - \Delta_{Net}(t) > T_S$. By the Lemma NoInfiniteLoop a Sync message always 
dies out. By the Lemma IncLocalTimer, during every $P$ time interval, at least one node 
reaches $P$ and, by the Lemma NewSync, a new Sync message always gets generated. At 
$t > 2P + \gamma$ and upon the subsequent resynchronization process, by the Lemmas 
JoinTimedOutAfterIgnored, JoinInterruptedAfterIgnored1, and JoinInterruptedAfterIgnored2, 
all nodes synchronize with their adjacent nodes with relative initial precision of any two 
adjacent nodes as $d + \delta(D)$, $\gamma + \delta(\gamma)$, and $\gamma + \delta(\gamma)$ clock ticks, respectively. Since the initial 
precision is an upper bound, thus $\Delta_{ij} = \gamma + \delta(\gamma)$. ■ 
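
The algebraic step from the bound on $\Delta_{Net}(t)$ to the bound on $P - \Delta_{Net}(t)$ in the proof above can be expanded as follows. This is only a worked restatement of the proof's own substitution, using $W = K - 1$ and $\delta(P) = K\delta(T_S)$ as stated there.

```latex
\begin{align*}
P - \Delta_{Net}(t)
  &> \bigl(K T_S + K\delta(T_S)\bigr)
     - \bigl((K-1)(T_S - D + \delta(D)) + K\delta(T_S)\bigr) \\
  &= K T_S - (K-1) T_S + (K-1)\bigl(D - \delta(D)\bigr) \\
  &= T_S + (K-1)\bigl(D - \delta(D)\bigr).
\end{align*}
```

The two $K\delta(T_S)$ terms cancel, which is why the drift over $P$ does not appear in the final bound.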

Lemma InitialPrecision - For all $t > C_{Init}$, where $C_{Init} = 2P + K(\gamma + \delta(\gamma))$ clock ticks, and upon 
the subsequent resynchronization process, the initial network precision is 
$\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. 

Proof - By the Lemma DeltaijAboutGamma, for all $t > 2P + \gamma$ and upon the subsequent 
resynchronization process, $\Delta_{ij} = \gamma + \delta(\gamma)$. By the definition of $W$, it takes at most $W(\gamma + \delta(\gamma))$ 
clock ticks for a message to reach all other nodes. Since $W \le K - 1$, for the worst case analysis 
we set $W = K - 1$. Since the initial precision of the network is an upper bound, $\Delta_{Net}(t)$ is the 
sum of all $\Delta_{ij}$, i.e., $\Delta_{Net}(t) = W\Delta_{ij} = (K-1)(\gamma + \delta(\gamma))$. Therefore, 
$\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. ■ 

Theorem InitialConvergence - For all $t > C_{Init}$ the network converges to a state where the 
guaranteed network precision is $\pi = \Delta_{Init} + \delta(P)$. 

Proof - By the Lemma DeltaijAboutGamma, for all $t > 2P + \gamma$ and upon the subsequent 
resynchronization process, $\Delta_{ij} = \gamma + \delta(\gamma)$. By the Lemma InitialPrecision, for all $t > C_{Init}$ and 
upon the completion of the subsequent resynchronization process, $\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. 
However, due to drift, $\Delta_{Init}$ will increase over a time interval of $P$ by a maximum amount of 
$\delta(P)$. Therefore, for all $t > C_{Init}$ the guaranteed precision of the network is 

$\pi = \Delta_{Init}$ + maximum drift in the network over $P$ 
$\pi = \Delta_{Init} + \delta(P)$. 

Thus, for all $t > C_{Init}$ the network converges with $\pi = \Delta_{Init} + \delta(P)$ and remains within $\pi$. ■ 

Corollary InitialConvergenceTime - The initial convergence time is $C_{Init}$. 

Proof - By the Theorem InitialConvergence the network converges within $C_{Init}$. ■ 


Corollary InitGuaranteedPrecision - For all $t > C_{Init}$, and upon the completion of the 
subsequent resynchronization process, the initial guaranteed precision of the network is 
$\Delta_{InitGuaranteed} = \Delta_{Init}$. 

Proof - By the Theorem InitialConvergence the network converges within $C_{Init}$ with an initial 
precision of $\Delta_{Init}$. It also follows that upon the completion of the subsequent 
resynchronization processes the initial guaranteed precision of the network for all $t > C_{Init}$ is 
$\Delta_{InitGuaranteed} = \Delta_{Init}$. ■ 

Theorem InitialClosure - For all $t > C_{Init}$, a synchronized network where all nodes have 
converged to $\pi = \Delta_{Init} + \delta(P)$ shall remain within the synchronization precision $\pi$. 

Proof - By the Lemma InitialPrecision, for $t > C_{Init}$, $\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. By the Theorem 
InitialConvergence, for $t > C_{Init}$ the network has converged with guaranteed network precision 
$\pi = \Delta_{Init} + \delta(P)$ while accounting for the maximum drift over a time interval of $P$. By the 
Theorem InitialConvergence and upon subsequent resynchronization processes, the network 
precision will remain within $\pi$. Thus, the network has converged, is synchronized, and 
remains synchronized within $\pi$. ■ 

Theorem InitialNetworkPrecision - For all $t > C_{Init}$, where $C_{Init} = 2P + K(\gamma + \delta(\gamma))$ clock ticks, 
the network precision is $\pi = \Delta_{Init} + \delta(P)$. 

Proof - By the Theorems InitialConvergence and InitialClosure, for all $t > C_{Init}$, 
$\pi = \Delta_{Init} + \delta(P)$. ■ 

Theorem InitialCongruence - For all nodes $N_i$ and for all $t > C_{Init}$, $(N_i.LocalTimer(t) = \gamma)$ 
implies $\Delta_{Net}(t) < \pi$. 

Proof - By the protocol (part E3), upon time out a node resets its LocalTimer to zero. By the 
Lemma IncLocalTimer the LocalTimer of a node will always get incremented and, unless 
interrupted, a timed out node will reach $\gamma$. By the protocol (parts E1, E2), when a node gets 
interrupted, it resets its LocalTimer to $\gamma$. Thus, a node will always reach $\gamma$. By the Theorems 
InitialConvergence and InitialClosure, for all $t > C_{Init}$ the network converges and remains 
synchronized with $\pi = \Delta_{Init} + \delta(P)$. Therefore, for all $t > C_{Init}$ and $0 < \rho \ll 1$, when 
$N_i.LocalTimer(t) = \gamma$ the network precision remains within $\pi$, i.e., $\Delta_{Net}(t) < \pi$, and all nodes 
are in synchrony. ■ 
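
The initial quantities established by the preceding lemmas and theorems can be evaluated for concrete parameters. The following is a minimal sketch, assuming a linear drift model $\delta(t) = \rho t$ (an assumption of this example, not a statement of the report); the function name `initial_bounds` is hypothetical.

```python
from fractions import Fraction


def initial_bounds(K, D, d, rho, P):
    """Return (C_Init, Delta_Init, pi) from the formulas of this section.

    C_Init     = 2P + K(gamma + delta(gamma))
    Delta_Init = (K - 1)(gamma + delta(gamma))
    pi         = Delta_Init + delta(P)
    """
    gamma = D + d                         # communication latency
    per_hop = gamma + rho * gamma         # gamma + delta(gamma), assumed linear drift
    c_init = 2 * P + K * per_hop
    delta_init = (K - 1) * per_hop
    pi = delta_init + rho * P
    return c_init, delta_init, pi


# Drift-free example: K = 4 nodes, D = 10, d = 2, P = 1000 clock ticks.
print(initial_bounds(4, 10, 2, Fraction(0), 1000))
```

With a nonzero drift rate the same call shows how $\delta(P)$ widens $\pi$ beyond $\Delta_{Init}$.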

The proof presented thus far is based on very conservative measures and has demonstrated the 
correctness of the protocol even when the network parameters are considered at their boundary 
(minimum and maximum) values. However, upon the initial convergence and closure, there are 
a number of ways to achieve tighter precision. One such method is introduced in [Mal 2006B], 
where, in a two-step process, another proven correct protocol that is based on the initial 
synchrony assumptions is used to achieve the optimum precision of the coarsely 
synchronized system. Another method is to add a few random links or rewire links with 
a certain probability to provide shortcuts between different segments of a graph [Wat 1998][Gad 
2000][Bar 2002][Hon 2002, 2004][Li 2004][Gom 2007]. These ideas, as discussed in [Mal 
2011A], are primarily used to achieve convergence, but can also be used to achieve tighter 
precision. There is yet another method, presented below, that simply requires more time. 
Performance of this method depends primarily on the drift rate. 

From the expression $\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$ it is evident that the synchronization time, $C$, and 
precision, $\pi$, are also functions of the graph topology and the drift rate, specifically, the graph's 
width and the amount of drift the network experiences. In other words, 

$C = f(T, \rho) = f(W, \delta(P))$, and 
$\pi = f(T, \rho) = f(W, \delta(P))$. 

Note that the general equation for $\Delta_{Init}$ encompasses the ideal and semi-ideal scenarios. In 
particular, for the ideal scenario, where $\rho = 0$ and $d = 0$, and for the semi-ideal scenario, where 
$\rho = 0$ and $d > 0$, $\Delta_{Init} = (K-1)\gamma$. 

Thus far and in the proof process, we assumed $0 < \rho \ll 1$. The proof, however, holds when this 
bound is less restricted, i.e., $0 < \rho < 1$. When $\rho \gg 0$ such that $\delta(\gamma) > \gamma$, by the Theorem 
InitialConvergence, for all $t > C_{Init}$ and upon subsequent resynchronization processes, the 
network converges with $\pi = \Delta_{Init} + \delta(P)$. In other words, due to the high rate of drift, no further 
improvement on $\Delta_{Init}$ and the network precision, $\pi$, can be guaranteed. Thus, when $\rho \gg 0$, 
$\Delta_{InitGuaranteed} \le \Delta_{Init}$, and no further improvement on $\Delta_{Init}$ is achieved; therefore, no improvement 
on $\pi$ can be guaranteed. Furthermore, the convergence time $C$ is bounded by $C_{Init}$, i.e., $C = C_{Init}$. 

However, if $\rho \ll 1$, although the initial (coarse) synchrony, $\Delta_{Init}$, occurs within $C_{Init}$, the initial 
guaranteed synchrony, $\Delta_{InitGuaranteed}$, takes place after a number of periods and upon achieving the 
initial synchrony, i.e., $\Delta_{Init}$. We demonstrate this by the following lemmas for the semi-ideal 
scenarios where $\rho = 0$, $d > 0$, and $\gamma = D + d$. 

Lemma pdJoinTimedOut - For $\rho = 0$, when a node, $N_j$, is interrupted by an adjacent node 
that was timed out, $N_i$, it synchronizes with that node with a relative initial precision 
of $\Delta_{ij} = d$. 

Proof - By the Lemma JoinTimedOut, $\Delta_{ij} = d + \delta(D)$. Since $\rho = 0$ and $d > 0$, $\Delta_{ij} = d$. ■ 

Lemma pdJoinInterrupted - For $\rho = 0$, when a node, $N_j$, is interrupted by an adjacent node 
that was in turn interrupted, $N_i$, it synchronizes with that node with a relative initial precision 
of $\Delta_{ij} = \gamma$. 

Proof - By the Lemma JoinInterrupted, $\Delta_{ij} = \gamma + \delta(\gamma)$. Since $\rho = 0$ and $d > 0$, $\Delta_{ij} = \gamma$. ■ 

Lemma pdDeltaijAboutGamma - For $\rho = 0$, for all $t > 2P + \gamma$ clock ticks, the relative initial 
precision of any two adjacent nodes $N_i$ and $N_j$ is $\Delta_{ij} = \gamma$. 

Proof - By the Lemma DeltaijAboutGamma, $\Delta_{ij} = \gamma + \delta(\gamma)$. Since $\rho = 0$ and $d > 0$, $\Delta_{ij} = \gamma$. ■ 

Lemma pdInitialPrecision - For $\rho = 0$, for all $t > C_{Init}$, where $C_{Init} = 2P + K\gamma$ clock ticks, and 
upon the subsequent resynchronization process, the initial network precision is $\Delta_{Init} = (K-1)\gamma$. 

Proof - By the Lemma InitialPrecision, for all $t > C_{Init}$, $\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. Since $\rho = 0$ 
and $d > 0$, $\Delta_{Init} = (K-1)\gamma$. ■ 


Lemma pdInitGuaranteedPrecision - For $\rho = 0$ and for all $t > C$, the initial guaranteed 
precision is $\Delta_{InitGuaranteed} = Wd$. 

Proof - By the Theorems InitialConvergence and InitialClosure, the network has converged, 
is synchronized, and remains synchronized for all $t > C_{Init}$, and upon subsequent 
resynchronization processes the network converges to $\Delta_{Net}(t) = \Delta_{Init} + \delta(P)$. Since $\rho = 0$, 
$\Delta_{Net}(t) = \Delta_{Init}$. By the Lemma pdDeltaijAboutGamma, for all $t > C_{Init}$, all adjacent nodes are 
within $\Delta_{ij} = \gamma$ of each other. However, since $\rho = 0$ and $d > 0$, for all $t > C_{Init}$ and upon 
subsequent resynchronization processes, by the Lemma pdJoinTimedOut at least two 
adjacent nodes will synchronize with $\Delta_{ij} = d$. Thus, at least one node per resynchronization 
process, i.e., per $P$, will synchronize with its adjacent nodes with $\Delta_{ij} = d$. Repeating this 
process for $\lceil \Delta_{Init}/\gamma \rceil = W$ periods will result in a synchronized network with a precision of 
$\Delta_{ij} = d$ for any two adjacent nodes $N_i$ and $N_j$ and an initial guaranteed precision of 
$\Delta_{InitGuaranteed} = Wd$ for the entire network. ■ 

Corollary pdPrecision - For $\rho = 0$, the precision of the network is $\pi = Wd$. 

Proof - By the Lemma pdInitGuaranteedPrecision, for $\rho = 0$, the network converges to 
$\Delta_{InitGuaranteed} = Wd$. Recall that $\pi = \Delta_{InitGuaranteed} + \delta(P)$. Since $\rho = 0$ and $\delta(P) = 0$, $\pi = Wd$. ■ 

Corollary pdConvergenceTime - For $\rho = 0$, the convergence time is $C = C_{Init} + \lceil \Delta_{Init}/\gamma \rceil P$. 

Proof - By the Lemma pdInitGuaranteedPrecision, for $\rho = 0$, the network converges to 
$\Delta_{InitGuaranteed} = Wd$ within $C = C_{Init} + \lceil \Delta_{Init}/\gamma \rceil P$. ■ 
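
For the drift-free case the two corollaries above reduce to integer arithmetic. The sketch below is illustrative only; it plugs $C_{Init} = 2P + K\gamma$ and $\Delta_{Init} = (K-1)\gamma$ from the Lemma pdInitialPrecision into the corollaries, and the function name `pd_convergence` is hypothetical.

```python
def pd_convergence(K, W, D, d, P):
    """Return (C, pi) for rho = 0.

    C  = C_Init + ceil(Delta_Init / gamma) * P
    pi = W * d
    """
    gamma = D + d
    c_init = 2 * P + K * gamma                 # Lemma pdInitialPrecision
    delta_init = (K - 1) * gamma
    periods = -(-delta_init // gamma)          # integer ceiling of Delta_Init / gamma
    return c_init + periods * P, W * d


# Example: K = 4 nodes, width W = 3, D = 10, d = 2, P = 1000 clock ticks.
print(pd_convergence(4, 3, 10, 2, 1000))
```

Since $\Delta_{Init}/\gamma = K - 1$ exactly when $\rho = 0$, the ceiling adds $K - 1$ further periods to $C_{Init}$ in this case.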


Note that when $d = 0$, i.e., for the ideal scenario, $\gamma = D$, $\Delta_{InitGuaranteed} = 0$, and $\pi = 0$. We now 
state the general lemmas and theorems that encompass all scenarios, i.e., for $0 \le \rho \ll 1$ and $d \ge 0$ 
and for all $t > C$. 

Theorem Convergence - For all $t > C$, the network converges to a state where the 
guaranteed network precision is $\pi$, i.e., $\Delta_{Net}(t) < \pi$. 

Proof - By definition, for all $t > C$, $\pi = \Delta_{InitGuaranteed} + \delta(P)$. By the Theorem 
InitialConvergence, for all $t > C_{Init}$, $\pi = \Delta_{Init} + \delta(P)$. For $0 < \rho \ll 1$, it also follows that upon 
the completion of subsequent resynchronization processes, i.e., for all $t > C$ where $C > C_{Init}$, 
the initial guaranteed precision of the network is $\Delta_{Init}$ and upon subsequent resynchronization 
processes, no further improvement on $\Delta_{Init}$ can be guaranteed. Therefore, no improvement on 
$\Delta_{InitGuaranteed}$ can be achieved. Also, by the Corollary InitGuaranteedPrecision, 
$\Delta_{InitGuaranteed} = \Delta_{Init}$ and so no improvement on $\pi$ can be guaranteed. Also, since the 
maximum drift for a time interval of $P$ is bounded by $\delta(P)$, the network has converged with 
precision $\pi$ and remains within $\pi$. ■ 

Theorem Closure - For all $t > C$, a synchronized network where all nodes have converged to 
$\Delta_{Net}(t) < \pi$ shall remain within the synchronization precision $\pi$. 

Proof - By the Theorem Convergence, for all $t > C$, the network converges with the precision 
$\pi = \Delta_{InitGuaranteed} + \delta(P)$. It also follows that for $0 < \rho \ll 1$ and for all $t > C$, upon subsequent 
resynchronization processes the network precision remains within $\pi$. Therefore, the network 
has converged, is synchronized, and remains synchronized within $\pi$. ■ 

Theorem Congruence - For all nodes $N_i$ and for all $t > C$, $(N_i.LocalTimer(t) = \gamma)$ implies 
$\Delta_{Net}(t) < \pi$. 

Proof - By the protocol (part E3), upon time out a node resets its LocalTimer to zero. By the 
Lemma IncLocalTimer the LocalTimer of a node will always get incremented and, unless 
interrupted, a timed out node will reach $\gamma$. By the protocol (parts E1, E2), when a node gets 
interrupted, it resets its LocalTimer to $\gamma$. Thus, a node will always reach $\gamma$. By the Theorems 
Convergence and Closure, for all $t > C$, the network converges and remains synchronized 
with $\pi = \Delta_{InitGuaranteed} + \delta(P)$. Thus, for all $t > C$ and $0 < \rho \ll 1$, when $N_i.LocalTimer(t) = \gamma$, 
the network precision remains within $\pi$, i.e., $\Delta_{Net}(t) < \pi$, and all nodes are in synchrony. ■ 

Lemma InitGuaranteedPrecision - For all $t > C$, the initial guaranteed precision of the 
network is $Wd \le \Delta_{InitGuaranteed} \le (K-1)(\gamma + \delta(\gamma))$, where $\Delta_{InitGuaranteed} = Wd$ for $\rho = 0$, and 
$\Delta_{InitGuaranteed} = (K-1)(\gamma + \delta(\gamma))$ for $\rho > 0$. 

Proof - By the Lemma InitialPrecision, for $0 < \rho \ll 1$ and for all $t > C_{Init}$ and upon 
the subsequent resynchronization process, the initial network precision is 
$\Delta_{Init} = (K-1)(\gamma + \delta(\gamma))$. By the Theorem InitialConvergence, for $0 < \rho \ll 1$ and $t > C_{Init}$ and 
upon the completion of the subsequent resynchronization processes, the initial guaranteed 
precision of the network is $\Delta_{Init}$, in other words, $\Delta_{InitGuaranteed} = \Delta_{Init}$. By the Lemma 
pdInitGuaranteedPrecision, for $\rho = 0$ and for all $t > C$, the initial guaranteed precision is 
$\Delta_{InitGuaranteed} = Wd$. Thus, for $0 < \rho \ll 1$ and for all $t > C$, the initial guaranteed precision is 
bounded by $Wd \le \Delta_{InitGuaranteed} \le (K-1)(\gamma + \delta(\gamma))$. ■ 


Theorem GuaranteedPrecision - For all $t > C$, the guaranteed precision of the network is 
bounded by $Wd \le \pi \le \Delta_{InitGuaranteed} + \delta(P)$. 

Proof - By the Lemma InitGuaranteedPrecision, for all $t > C$ and $0 < \rho \ll 1$, the initial 
network precision is bounded by $Wd \le \Delta_{InitGuaranteed} \le (K-1)(\gamma + \delta(\gamma))$, with the lower bound 
for $\rho = 0$ and the upper bound for $\rho > 0$. By definition, for all $t > C$, $\pi = \Delta_{InitGuaranteed} + \delta(P)$. 
Therefore, for $0 < \rho \ll 1$ and for all $t > C$, the guaranteed precision of the network is 
bounded by $Wd \le \pi \le \Delta_{InitGuaranteed} + \delta(P)$. ■ 

Note that when $d = 0$, i.e., for the ideal scenario, $\gamma = D$, $\Delta_{InitGuaranteed} = 0$, and $\pi = 0$. 

Lemma ConvergenceTime - The convergence time is $C = C_{Init} + \lceil \Delta_{Init}/\gamma \rceil P$. 

Proof - By the Theorem InitialConvergence, for $0 < \rho \ll 1$, the network converges within 
$C_{Init}$. By the Corollary pdConvergenceTime, for $\rho = 0$, the convergence time is 
$C = C_{Init} + \lceil \Delta_{Init}/\gamma \rceil P$. Since the convergence time is an upper bound, for $0 < \rho \ll 1$, the convergence 
time is $C = C_{Init} + \lceil \Delta_{Init}/\gamma \rceil P$. ■ 


Theorem Liveness - For all $t > C$, the LocalTimer of every node sequentially takes on at least all 
integer values in $[\gamma, P - \pi]$. 

Proof - By the Theorems Convergence and Closure, for all $t > C$, a synchronized network 
where all nodes have converged to $\pi$ remains within $\pi$. Since the network is synchronized, 
all nodes either time out or get interrupted by a timed out adjacent node within $\pi$. For the 
worst case analysis, when the fastest node reaches $P$, times out, and transmits a new Sync 
message, the slowest node is at $P - \pi$. If the slowest node is adjacent to the fastest node, it 
gets interrupted within the next $\gamma$. If the slowest node is farthest away from the fastest node, 
it times out within $\pi$. If the slowest node gets interrupted before reaching $P$, by the protocol 
(parts E1, E2), it sets its LocalTimer while accounting for the message arrival delay, i.e., 
LocalTimer $= \gamma$. For the worst case analysis of the liveness property, the slowest node is 
adjacent to the fastest node. Since the network is in synchrony, the slowest node will not get 
interrupted again until a fastest node times out within the next $P$. Therefore, the slowest node 
sequentially takes on all integer values in $[\gamma, P - \pi]$. The fastest node, however, sequentially 
takes on all integer values in $[0, P]$. Furthermore, upon convergence this process repeats 
during every $P$ time interval and after every resynchronization process. Thus, for all $t > C$, 
all nodes sequentially take on at least all integer values in $[\gamma, P - \pi]$. ■ 

5. Conclusions 


In this report, we presented a deductive proof of the correctness of a self-stabilizing distributed 
clock synchronization protocol that is focused on the distributed synchronization of an arbitrary, 
fault-free, and non-partitioned digraph ranging from fully connected to 1-connected networks of 
nodes while allowing for differences in the network elements. We presented a deductive proof 
of the correctness of the protocol as it applies to the networks with unidirectional and 
bidirectional links. We also confirmed the claims of determinism and linear convergence. This 
protocol does not rely on assumptions about the initial state of the system and no central clock or 
centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not 
have unique identities. There is no theoretical limit on the maximum number of participating 
nodes. The only constraint on the behavior of the node is that the interactions with other nodes 
are restricted to defined links and interfaces. 

We have shown and proven how to synchronize an arbitrary digraph in the absence of faults. 
This effort brought up the following questions. Can an arbitrary digraph be synchronized in the 
presence of faults? What types of faults can an arbitrary digraph tolerate? Looking at the 
problem from a different perspective, if the faults are symmetric, what types of graphs can 
synchronize in their presence? What if the faults are asymmetric (Byzantine), what types of 
graphs can synchronize in their presence? 

Appendix A. Symbols 

The symbols used in the protocol are described in detail in [Malekpour 2010] and are listed here 
for reference. 

Symbols                    Descriptions 

$K$                        sum of all nodes 
$T$                        network topology 
$D$                        event-response delay 
$d$                        network imprecision 
$\rho$                     bounded drift rate with respect to real time 
$P$                        self-stabilization/synchronization period 
$F$                        sum of all faulty nodes 
$N_i$                      the i-th node 
$M_i$                      the i-th monitor of a node 
$\gamma$                   communication latency 
$L$                        the largest loop in the graph 
$W$                        the width or diameter of the graph 
$T_S$                      graph threshold 
$\pi$                      the guaranteed self-stabilization/synchronization precision 
$C$                        convergence time 
$C_{Init}$                 time of initial synchrony 
LocalTimer                 node's local logical clock 
$\Delta_{ij}$              precision between LocalTimers of any two adjacent nodes $N_i$ and $N_j$ 
$\Delta_{Init}$            initial precision among LocalTimers of all nodes immediately after a 
                           resynchronization process 
$\Delta_{InitGuaranteed}$  initial guaranteed precision among LocalTimers of all nodes immediately 
                           after a resynchronization process 
$\delta(t)$                drift per $t$ 
Sync                       self-stabilization/synchronization message 
$\Delta_{Net}(t)$          precision among LocalTimers of all nodes at time $t$ 